Abstract: We consider the problem of recovering a low-rank
matrix when some of its entries, whose locations are not
known a priori, are corrupted by errors of arbitrarily large 
magnitude. It has recently been shown that this problem can 
be solved efficiently and effectively by a convex program named 
Principal Component Pursuit (PCP), provided that the fraction of 
corrupted entries and the rank of the matrix are both sufficiently 
small. In this paper, we extend that result to show that the same 
convex program, with a slightly improved weighting parameter, 
exactly recovers the low-rank matrix even if "almost all" of 
its entries are arbitrarily corrupted, provided the signs of the 
errors are random. We corroborate our result with simulations 
on randomly generated matrices and errors. 

I. Introduction 

Low-rank matrix recovery and approximation have been 
extensively studied lately for their great importance in theory 
and practice. Low-rank matrices arise in many real data 
analysis problems when the high-dimensional data of interest 
lie on a low-dimensional linear subspace. This model has 
been extensively and successfully used in many diverse areas, 
including face recognition [1], system identification [2], and
information retrieval [3], just to name a few.

Principal Component Analysis (PCA) [4] is arguably the
most popular algorithm to compute low-rank approximations
to a high-dimensional data matrix. Essentially, PCA solves the
following optimization problem:

$$\min_{L} \; \|D - L\| \quad \text{s.t.} \quad \operatorname{rank}(L) \le r, \tag{1}$$

where $D \in \mathbb{R}^{m \times n}$ is the given data matrix, and $\|\cdot\|$ denotes
the matrix spectral norm. The optimal solution to the above
problem is the best rank-$r$ approximation (in an $\ell^2$ sense) to
$D$ [5]. Furthermore, PCA offers the optimal solution when the
matrix $D$ is corrupted by i.i.d. Gaussian noise. In addition to
theoretical guarantees, PCA can be computed stably and
efficiently via the Singular Value Decomposition (SVD).
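As an illustration of the SVD route to problem (1), the following minimal NumPy sketch keeps the top $r$ singular triplets of $D$ and checks that the spectral-norm error equals the $(r+1)$-th singular value. The function name `best_rank_r` and the small test matrix are illustrative assumptions, not part of the original text.

```python
import numpy as np

def best_rank_r(D, r):
    """Truncated SVD: keep the r largest singular triplets of D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
# Illustrative data: an 8 x 6 matrix of rank 3 (hypothetical sizes).
D = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
L_hat = best_rank_r(D, 2)

sing = np.linalg.svd(D, compute_uv=False)
spec_err = np.linalg.norm(D - L_hat, 2)   # spectral norm of the residual
print(spec_err, sing[2])                  # the two values agree up to rounding
```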

The major drawback of PCA is its brittleness to errors of 
large magnitude, even if such errors affect only a few entries 
of the matrix D. In fact, a single corrupted entry can throw 
the low-rank matrix L estimated by PCA arbitrarily far from 
the true solution. Unfortunately, these kinds of non-Gaussian, 
gross errors and corruptions are prevalent in modern data. For 
example, shadows in a face image corrupt only a small part of 
the image, but the corrupted pixels can be arbitrarily far from 
their true values in magnitude. 
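The brittleness described above is easy to reproduce numerically. The sketch below (sizes, seed, and magnitudes are hypothetical) corrupts a single entry of a rank-one matrix and measures how far the rank-one PCA estimate moves from the true matrix; the error grows with the size of the corruption.

```python
import numpy as np

def rank1_pca(D):
    """Best rank-1 approximation of D via the leading singular triplet."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0, :])

rng = np.random.default_rng(1)
n = 20
L0 = np.outer(rng.standard_normal(n), rng.standard_normal(n))  # true rank-1 matrix

errors = {}
for magnitude in (0.0, 1e3, 1e6):
    D = L0.copy()
    D[0, 0] += magnitude                  # corrupt one entry only
    errors[magnitude] = np.linalg.norm(rank1_pca(D) - L0, "fro")
    print(magnitude, errors[magnitude])
```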



Thus, the problem at hand is to recover a low-rank matrix
$L_0$ (the principal components) from a corrupted data matrix

$$D = L_0 + S_0,$$

where the entries of $S_0$ can have arbitrary magnitude. Although
this problem is intractable (NP-hard) to solve under
general conditions, recent studies have discovered that a certain
convex program can effectively solve this problem under
surprisingly broad conditions. The work of [5], [7] has proposed a
convex program to recover low-rank matrices when a fraction
of their entries have been corrupted by errors of arbitrary
magnitude, i.e., when the matrix $S_0$ is sufficiently sparse. This
approach, dubbed Principal Component Pursuit (PCP) by [6],
suggests solving the following convex optimization problem:

$$\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad D = L + S, \tag{2}$$

where $\|\cdot\|_*$ and $\|\cdot\|_1$ denote the matrix nuclear norm (sum of
singular values) and 1-norm (sum of absolute values of matrix
entries), respectively, and $\lambda > 0$ is a weighting parameter. For
square matrices of size $n \times n$, the main result of [6] can be
summarized as follows:

If the singular vectors of $L_0$ are not too coherent
with the standard basis, and the support of $S_0$ is
random, then solving the convex program (2) with
$\lambda = n^{-1/2}$ exactly recovers $L_0$ of rank $O(n/\log^2 n)$
from errors $S_0$ affecting $\rho n^2$ of the entries, where
$\rho > 0$ is a sufficiently small positive constant.

In this work, we extend the above result to show that under
the same assumptions, (2) recovers low-rank matrices even
if the fraction of corrupted entries $\rho$ is arbitrarily close to
one, provided the signs of the errors are random. Equivalently
speaking, almost all of the matrix entries can be badly
corrupted by random errors. The analysis in this paper is a
nontrivial modification to the arguments of [6] and leads to a
better estimate of the weighting parameter $\lambda$ that enables this
dense error-correction performance. We verify our result with 
simulations on randomly generated matrices. 

II. Assumptions and Main Result 

For convenience of notation, we consider square matrices of
size $n \times n$. The results stated here easily extend to non-square
matrices. 



Assumption A: Incoherence Model for $L_0$. It is clear that
for some low-rank and sparse pairs $(L_0, S_0)$, the problem of
separating $M = L_0 + S_0$ into the components that generated it
is not well-posed, e.g., if $L_0$ is itself a sparse matrix. In both
matrix completion and matrix recovery, it has proved fruitful
to restrict attention to matrices whose singular vectors are not
aligned with the canonical basis. This can be formalized via
the notion of incoherence introduced in [8]. If $L_0 = U \Sigma V^*$
denotes a reduced singular value decomposition of $L_0$, with
$U, V \in \mathbb{R}^{n \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$, then $L_0$ is $\mu$-incoherent if

$$\max_i \|U^* e_i\|^2 \le \frac{\mu r}{n}, \qquad \max_j \|V^* e_j\|^2 \le \frac{\mu r}{n}, \qquad \|UV^*\|_\infty \le \frac{\sqrt{\mu r}}{n}, \tag{3}$$

where the $e_i$'s are the canonical basis vectors in $\mathbb{R}^n$. Here,
$\|\cdot\|_\infty$ denotes the matrix $\infty$-norm (maximum absolute value
of matrix entries).
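To make the incoherence condition concrete, the sketch below computes the smallest $\mu$ for which a given matrix satisfies three bounds of the kind described above. It assumes the bounds take the form $\max_i \|U^* e_i\|^2 \le \mu r/n$, $\max_j \|V^* e_j\|^2 \le \mu r/n$, and $\|UV^*\|_\infty \le \sqrt{\mu r}/n$ (the last one as in the incoherence model of [6]); the function name `incoherence` and the two test matrices are illustrative.

```python
import numpy as np

def incoherence(L, r):
    """Smallest mu for which the rank-r factors of L satisfy all three bounds."""
    n = L.shape[0]
    U, _, Vt = np.linalg.svd(L, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    mu_rows = (n / r) * np.max(np.sum(U**2, axis=1))     # max_i ||U^* e_i||^2 <= mu r / n
    mu_cols = (n / r) * np.max(np.sum(V**2, axis=1))     # max_j ||V^* e_j||^2 <= mu r / n
    mu_joint = (n**2 / r) * np.max(np.abs(U @ V.T))**2   # ||U V^*||_inf <= sqrt(mu r) / n
    return max(mu_rows, mu_cols, mu_joint)

n = 50
flat = np.ones((n, n))             # rank 1, energy spread over all entries
spike = np.zeros((n, n))
spike[0, 0] = 1.0                  # rank 1 but also sparse
mu_flat = incoherence(flat, 1)
mu_spike = incoherence(spike, 1)
print(mu_flat, mu_spike)
```

The all-ones matrix attains the smallest possible value of $\mu$, while the single-spike matrix, which is simultaneously low-rank and sparse, has $\mu$ growing like $n^2$; this is the ill-posed case that the assumption rules out.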

Assumption B: Random Signs and Support for $S_0$. Similarly,
it is clear that for some very sparse patterns of corruption,
exact recovery is not possible, e.g., if $S_0$ affects an entire row
or column of the observation. In [6], such ambiguities are
avoided by placing a random model on $\Omega = \operatorname{supp}(S_0)$, which
we also adopt. In this model, each entry $(i,j)$ is included
in $\Omega$ independently with probability $\rho$. We say $\Omega \sim \mathrm{Ber}(\rho)$
whenever $\Omega$ is sampled from the above distribution. We further
introduce a random model for the signs of $S_0$: we assume that
for $(i,j) \in \Omega$, $\operatorname{sgn}((S_0)_{ij})$ is an independent random variable
taking values $\pm 1$ with probability 1/2. Equivalently, under this
model, if $E = \operatorname{sgn}(S_0)$, then

$$E_{ij} = \begin{cases} 1, & \text{w.p. } \rho/2, \\ 0, & \text{w.p. } 1 - \rho, \\ -1, & \text{w.p. } \rho/2. \end{cases} \tag{4}$$

This error model differs from the one assumed in [6], in
which the error signs come from any fixed (even adversarial)
$n \times n$ sign pattern. The stronger assumption that the signs are
random is necessary for dense error correction.
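The random sign-and-support model can be sampled directly. The following sketch (function name, seed, and sizes are hypothetical) draws a Bernoulli support with parameter $\rho$ and independent symmetric signs, so that each entry is $+1$ or $-1$ with probability $\rho/2$ each and $0$ otherwise.

```python
import numpy as np

def sample_error_signs(n, rho, rng):
    """Draw an n x n sign pattern with Bernoulli(rho) support and symmetric signs."""
    support = rng.random((n, n)) < rho           # Omega ~ Ber(rho)
    signs = rng.choice([-1, 1], size=(n, n))     # independent of the support
    return support * signs

rng = np.random.default_rng(2)
E = sample_error_signs(200, 0.3, rng)
frac_pos = np.mean(E == 1)
frac_neg = np.mean(E == -1)
frac_zero = np.mean(E == 0)
print(frac_pos, frac_zero, frac_neg)             # close to 0.15, 0.70, 0.15
```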

Our main result states that under the above assumptions
and models, PCP corrects large fractions of errors. In fact,
provided the dimension is high enough and the matrix $L_0$ is
sufficiently low-rank, $\rho$ can be any constant less than one:

Theorem 1 (Dense Error Correction via PCP). Fix any
$\rho < 1$. Suppose that $L_0$ is an $n \times n$ matrix of rank $r$ obeying
(3) with incoherence parameter $\mu$, and the entries of $\operatorname{sgn}(S_0)$
are sampled i.i.d. according to (4). Then as $n$ becomes large,^1
Principal Component Pursuit (2) exactly recovers $(L_0, S_0)$
with high probability, provided

$$\lambda = C_1 \left(4\sqrt{1-\rho} + \frac{9}{4}\right)^{-1} \sqrt{\frac{1-\rho}{\rho n}}, \qquad r < \frac{C_2\, n}{\mu \log^2 n}, \tag{5}$$

where $0 < C_1 \le 4/5$ and $C_2 > 0$ are certain constants.



In other words, provided the rank of a matrix is of the order
of $n/(\mu \log^2 n)$, PCP can recover the matrix exactly even when
an arbitrarily large fraction of its entries are corrupted by errors
of arbitrary magnitude and the locations of the uncorrupted
entries are unknown.

^1 For $\rho$ closer to one, the dimension $n$ must be larger; formally, $n > n_0(\rho)$.
By "high probability", we mean with probability at least $1 - c n^{-\beta}$ for some
fixed $\beta > 0$.

Relations to Existing Results. While [6] has proved that
PCP succeeds, with high probability, in recovering $L_0$ and
$S_0$ exactly with $\lambda = n^{-1/2}$, the analysis required that the
fraction of corrupted entries $\rho$ is small. The new result shows
that, with random error signs, PCP succeeds with $\rho$ arbitrarily
close to one. This result also suggests using a slightly modified
weighting parameter $\lambda$. Although the new $\lambda$ is of the same
order as $n^{-1/2}$, we identify a dependence on $\rho$ that is crucial
for correctly recovering $L_0$ when $\rho$ is large.

This dense error correction result is not an isolated phenomenon
when dealing with high-dimensional highly correlated
signals. In a sense, this work is inspired by a conceptually
similar result for recovering sparse signals via $\ell^1$ minimization
[9]. To summarize, to recover a sparse signal $x$ from corrupted
linear measurements $y = Ax + e$, one can solve the convex
program $\min \|x\|_1 + \|e\|_1$ s.t. $y = Ax + e$. It has been shown
in [9] that if $A$ is sufficiently coherent and $x$ sufficiently sparse,
the convex program can exactly recover $x$ even if the fraction
of nonzero entries in $e$ approaches one.

The result is also similar in spirit to results on matrix
completion [8], [10], [11], which show that under similar
incoherence assumptions, low-rank matrices can be recovered
from vanishing fractions of their entries.

III. Main Ideas of the Proof 

The proof of Theorem 1 follows a line of argument similar to that
presented in [6], and is based on the idea of constructing a
dual certificate $W$ whose existence certifies the optimality of
$(L_0, S_0)$. As in [6], the dual certificate is constructed in two
parts via a combination of the "golfing scheme" of David
Gross [11], and the method of least squares. However, several
details of the construction must be modified to accommodate
a large $\rho$.

Before continuing, we fix some notation. Given the compact
SVD of $L_0 = U \Sigma V^*$, we let $T \subset \mathbb{R}^{n \times n}$ denote the linear
subspace $\{UX^* + YV^* \mid X, Y \in \mathbb{R}^{n \times r}\}$. By a slight abuse of
notation, we also denote by $\Omega$ the linear subspace of matrices
whose support is a subset of $\Omega$. We let $\mathcal{P}_T$ and $\mathcal{P}_\Omega$ denote the
projection operators onto $T$ and $\Omega$, respectively.

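The following lemma introduces a dual vector that, in turn,
ensures that $(L_0, S_0)$ is the unique optimal solution to (2).

Lemma 1. (Dual Certificate) Assume $\lambda < 1 - \alpha$ and
$\|\mathcal{P}_\Omega \mathcal{P}_T\| \le 1 - \epsilon$ for some $\alpha, \epsilon \in (0,1)$. Then, $(L_0, S_0)$ is
the unique solution to (2) if there is a pair $(W, F)$ obeying

$$UV^* + W = \lambda \left( \operatorname{sgn}(S_0) + F + \mathcal{P}_\Omega D \right)$$

with $\mathcal{P}_T W = 0$ and $\|W\| \le \alpha$, $\mathcal{P}_\Omega F = 0$ and $\|F\|_\infty \le \frac{1}{2}$,
and $\|\mathcal{P}_\Omega D\|_F \le \epsilon^2$.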

We prove this lemma in the appendix. Lemma 1 generalizes
Lemma 2.5 of [6] as follows:
1) [6] assumes that $\|\mathcal{P}_\Omega \mathcal{P}_T\| \le 1/2$, whereas we only
require that $\|\mathcal{P}_\Omega \mathcal{P}_T\|$ is bounded away from one. By
Lemma 2, the former assumption is justified only for
small values of $\rho$ (or for small amounts of corruption).
2) While [6] requires that $\|W\| \le 1/2$, we impose a more
general bound on $\|W\|$. We find that a value of $\alpha$ closer
to 1 gives a better estimate of $\lambda$.
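For example, by setting $\alpha = 9/10$, to prove that $(L_0, S_0)$ is
the unique optimal solution to (2), it is sufficient to find a dual
vector $W$ satisfying

$$\begin{cases} \mathcal{P}_T W = 0, \\ \|W\| \le \frac{9}{10}, \\ \|\mathcal{P}_\Omega (UV^* + W - \lambda \operatorname{sgn}(S_0))\|_F \le \lambda \epsilon^2, \\ \|\mathcal{P}_{\Omega^\perp} (UV^* + W)\|_\infty < \frac{\lambda}{2}, \end{cases} \tag{6}$$

assuming that $\|\mathcal{P}_\Omega \mathcal{P}_T\| \le 1 - \epsilon$ and $\lambda < 1/10$.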

We construct a dual certificate in two parts, $W = W^L + W^S$,
using a variation of the golfing scheme [11] presented in [6].
1) Construction of $W^L$ using the golfing scheme. The golfing
scheme writes $\Omega^c = \bigcup_{j=1}^{j_0} \Omega_j$, where the $\Omega_j \subseteq
[n] \times [n]$ are independent $\mathrm{Ber}(q)$, with $q$ chosen so that
$(1 - q)^{j_0} = \rho$.^2 The choice of $q$ ensures that indeed
$\Omega \sim \mathrm{Ber}(\rho)$, while the independence of the $\Omega_j$'s allows
a simple analysis of the following iterative construction:

Starting with $Y_0 = 0$, we iteratively define

$$Y_j = Y_{j-1} + q^{-1} \mathcal{P}_{\Omega_j} \mathcal{P}_T (UV^* - Y_{j-1}),$$

and set

$$W^L = \mathcal{P}_{T^\perp} Y_{j_0}. \tag{7}$$



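2) Construction of $W^S$ using least squares. We set

$$W^S = \arg\min_Q \|Q\|_F \quad \text{s.t.} \quad \mathcal{P}_\Omega Q = \lambda \operatorname{sgn}(S_0), \quad \mathcal{P}_T Q = 0.$$

Since $\|\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega\| = \|\mathcal{P}_\Omega \mathcal{P}_T\|^2 < 1$, it is not difficult to
show that the solution is given by the Neumann series

$$W^S = \lambda \mathcal{P}_{T^\perp} \sum_{k \ge 0} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k \operatorname{sgn}(S_0). \tag{8}$$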
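The two-part construction can be sketched numerically on a small instance. The code below is only an illustration under stated assumptions: it uses the standard closed form $\mathcal{P}_T M = UU^*M + MVV^* - UU^*MVV^*$, takes $j_0 = 2\lceil \log n \rceil$, sets $W^L = \mathcal{P}_{T^\perp} Y_{j_0}$ after the golfing iterations, and truncates the Neumann series for $W^S$ once its terms become negligible. All function names, the seed, and the values of $n$, $r$, $\rho$, $\lambda$ are hypothetical.

```python
import math
import numpy as np

def proj_T(M, U, V):
    """Orthogonal projection onto T = {U X^* + Y V^*}."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ M + M @ PV - PU @ M @ PV

def golfing_WL(U, V, rho, rng):
    """Golfing iterations; returns W^L and the boolean mask of Omega."""
    n = U.shape[0]
    j0 = 2 * math.ceil(math.log(n))          # assumed reading of j_0
    q = 1.0 - rho ** (1.0 / j0)              # solves (1 - q)^{j0} = rho
    UV = U @ V.T
    Y = np.zeros((n, n))
    omega_c = np.zeros((n, n), dtype=bool)
    for _ in range(j0):
        omega_j = rng.random((n, n)) < q     # Omega_j ~ Ber(q), independent batches
        omega_c |= omega_j
        Y = Y + omega_j * proj_T(UV - Y, U, V) / q
    return Y - proj_T(Y, U, V), ~omega_c     # W^L = P_{T^perp} Y_{j0}

def neumann_WS(E, U, V, lam, max_terms=2000, tol=1e-13):
    """Truncated Neumann series for W^S, with E = sgn(S_0) supported on Omega."""
    omega = E != 0
    term = E.astype(float)
    total = term.copy()
    for _ in range(max_terms):
        term = omega * proj_T(omega * term, U, V)   # apply P_Omega P_T P_Omega
        total += term
        if np.linalg.norm(term) < tol:
            break
    return lam * (total - proj_T(total, U, V))

rng = np.random.default_rng(3)
n, r, rho, lam = 60, 1, 0.3, 0.05            # hypothetical small instance
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
WL, omega = golfing_WL(U, V, rho, rng)
E = omega * rng.choice([-1, 1], size=(n, n))
WS = neumann_WS(E, U, V, lam)
W = WL + WS
print(np.linalg.norm(W, 2), np.abs(~omega * (U @ V.T + W)).max())
```

By construction, $\mathcal{P}_T W = 0$ and $\mathcal{P}_\Omega W^S = \lambda \operatorname{sgn}(S_0)$ hold up to rounding; whether the norm bounds required of the certificate hold depends on $n$, $r$, $\rho$, and $\lambda$, and they are not guaranteed at this small scale.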



In the remainder of this section, we present three lemmas
that establish the desired main result, Theorem 1. The first
lemma validates the principal assumption of Lemma 1 that
$\|\mathcal{P}_\Omega \mathcal{P}_T\|$ is bounded away from one. The other two lemmas
collectively prove that the dual certificate $W = W^L + W^S$
generated by the procedure outlined above satisfies (6) with
high probability, and thereby prove Theorem 1 by virtue of
Lemma 1.

Lemma 2. (Corollary 2.7 in [6]) Suppose that $\Omega \sim \mathrm{Ber}(\rho)$ and
$L_0$ obeys the incoherence model (3). Then, with high probability,
$\|\mathcal{P}_\Omega \mathcal{P}_T\|^2 \le \rho + \delta$, provided that $1 - \rho \ge C_0\, \delta^{-2}\, \frac{\mu r \log n}{n}$
for some numerical constant $C_0 > 0$.

This result plays a key role in establishing the following
two bounds on $W^L$ and $W^S$, respectively.

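Lemma 3. Assume that $\Omega \sim \mathrm{Ber}(\rho)$, and $\|\mathcal{P}_\Omega \mathcal{P}_T\| \le \sigma :=
\sqrt{\rho + \delta} < 1$. Set $j_0 = 2\lceil \log n \rceil$. Then, under the assumptions
of Theorem 1, the matrix $W^L$ obeys, with high probability,
(a) $\|W^L\| < 1/10$,
(b) $\|\mathcal{P}_\Omega (UV^* + W^L)\|_F < \lambda (1 - \sigma)^2$,
(c) $\|\mathcal{P}_{\Omega^\perp} (UV^* + W^L)\|_\infty < \lambda/4$.

^2 The value of $j_0$ is specified in Lemma 3.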

The proof of this lemma follows that of Lemma 2.8 of [6]
exactly; the only difference is that here we need to use tighter
constants that hold for larger $n$. The main tools needed are
bounds on the operator norm of $\mathcal{P}_\Omega \mathcal{P}_T$ (which follow from
Lemma 2), as well as bounds on

$$\frac{\|Q - q^{-1} \mathcal{P}_{\Omega_j} \mathcal{P}_T Q\|_\infty}{\|Q\|_\infty} \qquad \text{and} \qquad \frac{\|Q - q^{-1} \mathcal{P}_{\Omega_j} Q\|}{\|Q\|_\infty}$$

for any fixed nonzero $Q$ (which are given by Lemmas 3.1
and 3.2 of [6]). These bounds can be invoked thanks to the
independence between the $\Omega_j$'s in the golfing scheme. We omit
the details here due to limited space and invite the interested
reader to consult [6].

Lemma 4. Assume that $\Omega \sim \mathrm{Ber}(\rho)$, and that the signs of $S_0$
are i.i.d. symmetric (and independent of $\Omega$). Then, under the
assumptions of Theorem 1, the matrix $W^S$ obeys, with high
probability,

(a) $\|W^S\| < 8/10$,

(b) $\|\mathcal{P}_{\Omega^\perp} W^S\|_\infty < \lambda/4$.



See the appendix for the proof details. The proof of this
lemma makes heavy use of the randomness in $\operatorname{sgn}(S_0)$,
and the fact that these signs are independent of $\Omega$. The
idea is to first bound the norm of the linear operator
$\mathcal{R} = \mathcal{P}_{T^\perp} \sum_{k \ge 1} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k$, and then, conditioning on $\Omega$,
we use Hoeffding's inequality to obtain a tail bound for
$x^* \mathcal{R}(\operatorname{sgn}(S_0)) y$ for any fixed $x, y$. This extends to a bound on
$\|\mathcal{R}(\operatorname{sgn}(S_0))\| = \sup_{\|x\| \le 1, \|y\| \le 1} x^* \mathcal{R}(\operatorname{sgn}(S_0)) y$, and hence on $\|W^S\|$, via a union bound
across an appropriately chosen net. We state this argument
formally in the appendix.

Although the line of argument here is similar to the proof
of Lemma 2.9 in [6], there are some important differences
since that work assumed that $\rho$ (and hence, $\|\mathcal{P}_\Omega \mathcal{P}_T\|$) is
small. Our analysis gives a tighter probabilistic bound for
$\|\mathcal{P}_{T^\perp} \sum_{k \ge 1} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k E\|$, which in turn yields a better
estimate of the weighting parameter $\lambda$ as a function of $\rho$.

IV. Simulations 

In this section, we provide simulation results on randomly
generated matrices to support our main result, and suggest
potential improvements to the value of $\lambda$ predicted by our
analysis in this paper. For a given dimension $n$, rank $r$, and
sparsity parameter $\rho$, we generate $L_0$ and $S_0$ as follows:

1) $L_0 = R_1 R_2^*$, where $R_1, R_2 \in \mathbb{R}^{n \times r}$ are random matrices
whose entries are i.i.d. according to a normal
distribution with mean zero and variance $100/n$.

2) $S_0$ is a sparse matrix with exactly $\rho n^2$ non-zero entries,
whose support is chosen uniformly at random from all
possible supports of size $\rho n^2$.^3 The non-zero entries of
$S_0$ take value $\pm 1$ with probability 1/2.

^3 As argued in Appendix 7.1 of [6], from the perspective of success of the
algorithm, this uniform model is essentially equivalent to the Bernoulli model.
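The two generation steps above can be sketched as follows. The function name `generate_instance`, the seed, and the example sizes are illustrative choices; the product is taken as $R_1 R_2^*$ so that $L_0$ is $n \times n$.

```python
import numpy as np

def generate_instance(n, r, rho, rng):
    """Random rank-r matrix L0 and sparse +/-1 error matrix S0 with rho*n^2 non-zeros."""
    std = np.sqrt(100.0 / n)                       # entries of R1, R2 have variance 100/n
    R1 = rng.normal(0.0, std, size=(n, r))
    R2 = rng.normal(0.0, std, size=(n, r))
    L0 = R1 @ R2.T
    k = int(round(rho * n * n))                    # exact number of corrupted entries
    support = rng.choice(n * n, size=k, replace=False)   # uniform over supports of size k
    S0 = np.zeros(n * n)
    S0[support] = rng.choice([-1.0, 1.0], size=k)
    return L0, S0.reshape(n, n)

rng = np.random.default_rng(4)
L0, S0 = generate_instance(100, 2, 0.25, rng)
D = L0 + S0                                        # observed corrupted matrix
print(np.linalg.matrix_rank(L0), np.count_nonzero(S0))
```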
Fig. 1. Dense error correction for varying dimension. Panels: (a) $r = 1$, $C_1 = 0.8$; (b) $r = 1$, $C_1 = 4$. Given $n$, $r$, and $\rho$, we generate $L_0 = R_1 R_2^*$ as the product of two independent $n \times r$ i.i.d.
$\mathcal{N}(0, 100/n)$ matrices, and $S_0$ is a sparse matrix with $\rho n^2$ non-zero entries taking values $\pm 1$ with probability 1/2. For each pair $(n, \rho)$, the plots show the
fraction of successful recoveries over a total of 10 independent trials. Here, white denotes reliable recovery in all trials, and black denotes failure in all trials,
with a linear scale for intermediate fractions.



We use the augmented Lagrange multiplier method (ALM)
[12] to solve (2). This algorithm exhibits good convergence
behavior, and since its iterations each have the same complexity
as an SVD, it is scalable to reasonably large matrices.
Let $(\hat{L}, \hat{S})$ be the optimal solution to (2). The recovery is
considered successful if $\|L_0 - \hat{L}\|_F / \|L_0\|_F < 0.01$, i.e., the relative
error in the recovered low-rank matrix is less than 1%.
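For readers who wish to experiment, the following is a minimal sketch of an inexact augmented Lagrange multiplier iteration for the PCP program, together with the 1% success criterion. It is not the authors' implementation: the initialization, the penalty growth factor 1.5, the stopping tolerance, and the small test instance with $\lambda = n^{-1/2}$ are all assumptions made for illustration.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (one SVD per call)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp_alm(D, lam, tol=1e-7, max_iter=500, growth=1.5):
    """Inexact ALM sketch for min ||L||_* + lam ||S||_1 s.t. D = L + S."""
    norm_two = np.linalg.norm(D, 2)
    Y = D / max(norm_two, np.abs(D).max() / lam)   # assumed dual initialization
    mu = 1.25 / norm_two                           # assumed initial penalty
    S = np.zeros_like(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S                              # constraint residual
        Y = Y + mu * R
        mu *= growth
        if np.linalg.norm(R, "fro") <= tol * np.linalg.norm(D, "fro"):
            break
    return L, S

def recovered(L0, L_hat, threshold=0.01):
    """Success criterion: relative Frobenius error below 1%."""
    return np.linalg.norm(L0 - L_hat, "fro") / np.linalg.norm(L0, "fro") < threshold

rng = np.random.default_rng(5)
n, r, rho = 60, 1, 0.05                            # hypothetical easy instance
std = np.sqrt(100.0 / n)
L0 = rng.normal(0.0, std, (n, r)) @ rng.normal(0.0, std, (n, r)).T
S0 = (rng.random((n, n)) < rho) * rng.choice([-1.0, 1.0], size=(n, n))
D = L0 + S0
L_hat, S_hat = pcp_alm(D, lam=1.0 / np.sqrt(n))
print(recovered(L0, L_hat))
```

Each iteration costs one SVD, which matches the complexity remark above; the geometric growth of the penalty parameter is what drives the constraint residual to zero in a few dozen iterations.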

For our first experiment, we fix $\operatorname{rank}(L_0) = 1$. This case
demonstrates the best possible error correction behavior for
any given dimension $n$. We vary $n$ from 400 up to 1600, and
for each $n$ consider varying $\rho \in (0, 1)$. For each $(n, \rho)$ pair,
we choose

$$\lambda = C_1 \left(4\sqrt{1-\rho} + \frac{9}{4}\right)^{-1} \sqrt{\frac{1-\rho}{n\rho}} \tag{9}$$

with $C_1 = 0.8$ as suggested by Theorem 1. Figure 1(a) plots
the fraction of successes across 10 independent trials. Notice
that the amount of corruption that PCP can handle increases
monotonically with dimension $n$.
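The dependence of the weighting parameter on $\rho$ can be tabulated with a few lines of code. The sketch assumes the weighting rule reads $\lambda = C_1 (4\sqrt{1-\rho} + 9/4)^{-1} \sqrt{(1-\rho)/(n\rho)}$, the form implied by the bound on $\|W^S\|$ in the proof of Lemma 4; the function name `pcp_lambda` is hypothetical.

```python
import math

def pcp_lambda(n, rho, c1=0.8):
    """Weighting parameter in the form assumed above."""
    return c1 / (4.0 * math.sqrt(1.0 - rho) + 9.0 / 4.0) * math.sqrt((1.0 - rho) / (n * rho))

n = 1600
for rho in (0.1, 0.5, 0.9):
    lam = pcp_lambda(n, rho)
    # Bound lam * sqrt(rho n / (1 - rho)) * (4 sqrt(1 - rho) + 9/4) on ||W^S||;
    # with this choice of lam it collapses to c1.
    ws_bound = lam * math.sqrt(rho * n / (1.0 - rho)) * (4.0 * math.sqrt(1.0 - rho) + 9.0 / 4.0)
    print(rho, lam, n ** -0.5, ws_bound)
```

With $C_1 \le 4/5$ the bound equals $C_1 \le 8/10$ for every $\rho$, which is how the constant in Lemma 4(a) arises; the resulting $\lambda$ decreases as $\rho$ grows.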

We have found that the $\lambda$ given by our analysis is actually
somewhat pessimistic for moderate $n$: better error correction
behavior in relatively low dimensions can be observed by
choosing $\lambda$ according to (9), but with a larger constant $C_1 = 4$.
Figure 1(b) verifies this by repeating the same experiment
as in Figure 1(a) but with the modified $\lambda$. Indeed, we see
larger fractions of error successfully corrected. For instance,
we observe that for $n = 1600$, choosing $C_1 = 0.8$ enables
reliable recovery when up to 35% of the matrix entries are
corrupted, whereas with $C_1 = 4$, PCP can handle up to 75%
of corrupted entries. As discussed below, this suggests there is
still room for improving our bounds, either by tighter analysis
of the current construction or by constructing dual certificates
$W^S$ of smaller norm.

V. Discussion 

This work showed that PCP in fact corrects large fractions
of random errors, provided the matrix to be recovered satisfies
the incoherence condition and the corruptions are random
in both sign and support. The fact that a higher value of
the constant $C_1$ offers better error-correction performance in
moderate dimensions suggests that the analysis in this work
can be further strengthened. In our analysis, the value of $\lambda$
is essentially determined by the spectral norm of $W^S$; it is
reasonable to believe that dual certificates of smaller spectral
norm can be constructed by methods other than least squares.
Finally, while we have stated our results for the case of
square matrices, similar results can be obtained for non-square
matrices with minimal modification to the proof.

Appendix: Proofs of Lemma 1 and Lemma 4

Proof of Lemma 1

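Proof: Let $UV^* + W_0$ be a subgradient of the nuclear
norm at $L_0$, and $\operatorname{sgn}(S_0) + F_0$ be a subgradient of the $\ell^1$-norm
at $S_0$. For any feasible solution $(L_0 + H, S_0 - H)$ to (2),

$$\|L_0 + H\|_* + \lambda \|S_0 - H\|_1 \ge \|L_0\|_* + \lambda \|S_0\|_1 + \langle UV^* + W_0, H \rangle - \lambda \langle \operatorname{sgn}(S_0) + F_0, H \rangle.$$

Choosing $W_0$ such that $\langle W_0, H \rangle = \|\mathcal{P}_{T^\perp} H\|_*$ and $F_0$ such
that $\langle F_0, H \rangle = -\|\mathcal{P}_{\Omega^\perp} H\|_1$ (see footnote 4) gives

$$\|L_0 + H\|_* + \lambda \|S_0 - H\|_1 \ge \|L_0\|_* + \lambda \|S_0\|_1 + \|\mathcal{P}_{T^\perp} H\|_* + \lambda \|\mathcal{P}_{\Omega^\perp} H\|_1 + \langle UV^* - \lambda \operatorname{sgn}(S_0), H \rangle.$$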

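By assumption, $UV^* - \lambda \operatorname{sgn}(S_0) = \lambda F - W + \lambda \mathcal{P}_\Omega D$. Since
$\|W\| \le \alpha$ and $\|F\|_\infty \le \frac{1}{2}$, we have

$$|\langle UV^* - \lambda \operatorname{sgn}(S_0), H \rangle| \le \alpha \|\mathcal{P}_{T^\perp} H\|_* + \frac{\lambda}{2} \|\mathcal{P}_{\Omega^\perp} H\|_1 + \lambda |\langle \mathcal{P}_\Omega D, H \rangle|.$$

Substituting the above relation, we get

$$\begin{aligned} \|L_0 + H\|_* + \lambda \|S_0 - H\|_1 &\ge \|L_0\|_* + \lambda \|S_0\|_1 + (1 - \alpha) \|\mathcal{P}_{T^\perp} H\|_* + \frac{\lambda}{2} \|\mathcal{P}_{\Omega^\perp} H\|_1 - \lambda |\langle \mathcal{P}_\Omega D, H \rangle| \\ &\ge \|L_0\|_* + \lambda \|S_0\|_1 + (1 - \alpha) \|\mathcal{P}_{T^\perp} H\|_* + \frac{\lambda}{2} \|\mathcal{P}_{\Omega^\perp} H\|_1 - \lambda \epsilon^2 \|\mathcal{P}_\Omega H\|_F. \end{aligned}$$

^4 For instance, $F_0 = -\operatorname{sgn}(\mathcal{P}_{\Omega^\perp} H)$ and $W_0 = \mathcal{P}_{T^\perp} \bar{W}$, where $\|\bar{W}\| = 1$
and $\langle \bar{W}, \mathcal{P}_{T^\perp} H \rangle = \|\mathcal{P}_{T^\perp} H\|_*$. Such a $\bar{W}$ exists due to the duality between
$\|\cdot\|$ and $\|\cdot\|_*$.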



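We note that

$$\begin{aligned} \|\mathcal{P}_\Omega H\|_F &\le \|\mathcal{P}_\Omega \mathcal{P}_T H\|_F + \|\mathcal{P}_\Omega \mathcal{P}_{T^\perp} H\|_F \\ &\le (1 - \epsilon) \|H\|_F + \|\mathcal{P}_{T^\perp} H\|_F \\ &\le (1 - \epsilon) \left( \|\mathcal{P}_\Omega H\|_F + \|\mathcal{P}_{\Omega^\perp} H\|_F \right) + \|\mathcal{P}_{T^\perp} H\|_F, \end{aligned}$$

and, therefore,

$$\|\mathcal{P}_\Omega H\|_F \le \frac{1 - \epsilon}{\epsilon} \|\mathcal{P}_{\Omega^\perp} H\|_F + \frac{1}{\epsilon} \|\mathcal{P}_{T^\perp} H\|_F \le \frac{1 - \epsilon}{\epsilon} \|\mathcal{P}_{\Omega^\perp} H\|_1 + \frac{1}{\epsilon} \|\mathcal{P}_{T^\perp} H\|_*.$$

In conclusion, we have

$$\|L_0 + H\|_* + \lambda \|S_0 - H\|_1 \ge \|L_0\|_* + \lambda \|S_0\|_1 + \left( (1 - \alpha) - \lambda \epsilon \right) \|\mathcal{P}_{T^\perp} H\|_* + \lambda \left( \frac{1}{2} - (1 - \epsilon) \epsilon \right) \|\mathcal{P}_{\Omega^\perp} H\|_1.$$

Because $\|\mathcal{P}_\Omega \mathcal{P}_T\| < 1$, the intersection $\Omega \cap T = \{0\}$, and
hence, for any nonzero $H$, at least one of the above terms
involving $H$ is strictly positive. ■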
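Proof of Lemma 4

Proof:

Proof of (a). Let $E = \operatorname{sgn}(S_0)$. By assumption, the
distribution of each entry of $E$ is given by (4). Using (8)
we can express $W^S$ as:

$$W^S = \lambda \mathcal{P}_{T^\perp} E + \lambda \mathcal{P}_{T^\perp} \sum_{k \ge 1} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k E.$$

For the first term, we have $\|\lambda \mathcal{P}_{T^\perp} E\| \le \lambda \|E\|$. Using
standard arguments on the norm of a matrix with i.i.d. entries,
we have $\|E\| \le 4\sqrt{n\rho}$ with overwhelming probability [13].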



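For the second term, we set $\mathcal{R} = \mathcal{P}_{T^\perp} \sum_{k \ge 1} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k$,
so that the second term equals $\lambda \mathcal{R}(E)$. Notice that whenever $\|\mathcal{P}_\Omega \mathcal{P}_T\| < 1$,

$$\begin{aligned} \|\mathcal{R}\| &= \Big\| \mathcal{P}_{T^\perp} \sum_{k \ge 1} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k \Big\| \\ &\le \|\mathcal{P}_{T^\perp} \mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega\| \cdot \Big\| \sum_{k \ge 0} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k \Big\| \\ &\le \|\mathcal{P}_{T^\perp} \mathcal{P}_{\Omega^\perp} \mathcal{P}_T\| \cdot \|\mathcal{P}_T \mathcal{P}_\Omega\| \cdot \Big\| \sum_{k \ge 0} (\mathcal{P}_\Omega \mathcal{P}_T \mathcal{P}_\Omega)^k \Big\| \\ &\le \|\mathcal{P}_{\Omega^\perp} \mathcal{P}_T\| \cdot \|\mathcal{P}_\Omega \mathcal{P}_T\| \sum_{k \ge 0} \|\mathcal{P}_\Omega \mathcal{P}_T\|^{2k} \\ &= \frac{\|\mathcal{P}_{\Omega^\perp} \mathcal{P}_T\| \cdot \|\mathcal{P}_\Omega \mathcal{P}_T\|}{1 - \|\mathcal{P}_T \mathcal{P}_\Omega\|^2}. \end{aligned} \tag{10}$$

The third line uses $\mathcal{P}_{T^\perp} \mathcal{P}_\Omega \mathcal{P}_T = -\mathcal{P}_{T^\perp} \mathcal{P}_{\Omega^\perp} \mathcal{P}_T$, which holds because $\mathcal{P}_{T^\perp} \mathcal{P}_T = 0$.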
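Consider the two events:

$$\mathcal{E}_1 := \left\{ \|\mathcal{P}_\Omega \mathcal{P}_T\| \le \sqrt{\rho + \delta} \right\}, \qquad \mathcal{E}_2 := \left\{ \|\mathcal{P}_{\Omega^\perp} \mathcal{P}_T\| \le \sqrt{1 - \rho + \delta} \right\}.$$

For any fixed $\eta > 0$, we can choose $\delta(\eta, \rho) > 0$ such that on
$\mathcal{E}_1 \cap \mathcal{E}_2$,

$$\|\mathcal{R}\| \le (1 + \eta) \sqrt{\frac{\rho}{1 - \rho}}. \tag{11}$$

Since $\Omega \sim \mathrm{Ber}(\rho)$ and $\Omega^c \sim \mathrm{Ber}(1 - \rho)$, by Lemma 2, $\mathcal{E}_1 \cap \mathcal{E}_2$
occurs with high probability provided

$$r \le \delta(\eta, \rho)^2 \min(\rho, 1 - \rho)\, \frac{n}{\mu \log n}. \tag{12}$$

Since by assumption $r \le C n / (\mu \log^2 n)$, (12) holds for
$n$ sufficiently large.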


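For any $\tau \in (0, 1)$, let $N_\tau$ denote a $\tau$-net for $\mathbb{S}^{n-1}$ of size
at most $(3/\tau)^n$ (see [14] Lemma 3.18). Then, it can be shown
that

$$\|\mathcal{R}(E)\| = \sup_{x, y \in \mathbb{S}^{n-1}} \langle y, \mathcal{R}(E) x \rangle \le (1 - \tau)^{-2} \sup_{x, y \in N_\tau} \langle y, \mathcal{R}(E) x \rangle.$$

For a fixed pair $(x, y) \in N_\tau \times N_\tau$, we define $X(x, y) =
\langle y, \mathcal{R}(E) x \rangle = \langle \mathcal{R}^*(y x^*), E \rangle$, where $\mathcal{R}^*$ denotes the adjoint of $\mathcal{R}$. Conditional on $\Omega = \operatorname{supp}(E)$,
the signs of $E$ are i.i.d. symmetric and by Hoeffding's inequality,
we have

$$\mathbb{P}\left( |X(x, y)| > t \mid \Omega \right) \le 2 \exp\left( -\frac{2 t^2}{\|\mathcal{R}^*(y x^*)\|_F^2} \right).$$

Since $\|y x^*\|_F = 1$, we have $\|\mathcal{R}^*(y x^*)\|_F \le \|\mathcal{R}^*\| = \|\mathcal{R}\|$, so

$$\mathbb{P}\left( \sup_{x, y \in N_\tau} |X(x, y)| > t \mid \Omega \right) \le 2 |N_\tau|^2 \exp\left( -\frac{2 t^2}{\|\mathcal{R}\|^2} \right),$$

and for any fixed $\Omega \in \mathcal{E}_1 \cap \mathcal{E}_2$, combining the net bound with (11),

$$\mathbb{P}\left( \|\mathcal{R}(E)\| > t \mid \Omega \right) \le 2 \left( \frac{3}{\tau} \right)^{2n} \exp\left( -\frac{2 (1 - \tau)^4 (1 - \rho)\, t^2}{(1 + \eta)^2 \rho} \right).$$

In particular, for any $C > (1 + \eta)(1 - \tau)^{-2} \sqrt{\log(3/\tau)}$ and $\Omega \in
\mathcal{E}_1 \cap \mathcal{E}_2$,

$$\mathbb{P}\left( \|\mathcal{R}(E)\| > C \sqrt{\frac{\rho n}{1 - \rho}} \;\Big|\; \Omega \right) < \exp(-C' n),$$

where $C'(C) > 0$. Since $\inf_{0 < \tau < 1} (1 - \tau)^{-2} \sqrt{\log(3/\tau)} <
9/4$, by an appropriate choice of $\tau$ and $\eta > 0$, we have

$$\mathbb{P}\left( \|\mathcal{R}(E)\| > \frac{9}{4} \sqrt{\frac{\rho n}{1 - \rho}} \right) < \exp(-C' n) + \mathbb{P}\left( (\mathcal{E}_1 \cap \mathcal{E}_2)^c \right).$$

Thus,

$$\|W^S\| \le \lambda \sqrt{\frac{\rho n}{1 - \rho}} \left( 4\sqrt{1 - \rho} + \frac{9}{4} \right) \le \frac{8}{10}$$

with high probability, provided $n$ is sufficiently large.

Proof of (b) follows the proof of Lemma 2.9 (b) of [6]. ■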
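The numerical constant 9/4 used in the proof of (a) can be checked directly. The short sketch below evaluates $(1 - \tau)^{-2} \sqrt{\log(3/\tau)}$ on a grid of $\tau \in (0, 1)$ and reports the smallest value found; the grid resolution and the function name `net_factor` are arbitrary illustrative choices.

```python
import math

def net_factor(tau):
    """The quantity (1 - tau)^{-2} * sqrt(log(3 / tau)) from the net argument."""
    return math.sqrt(math.log(3.0 / tau)) / (1.0 - tau) ** 2

taus = [k / 1000.0 for k in range(1, 1000)]   # grid over (0, 1)
best_tau = min(taus, key=net_factor)
best_value = net_factor(best_tau)
print(best_tau, best_value)                   # the minimum lies slightly below 9/4
```

The minimum is attained for a small $\tau$ (a fine net), and it is strictly smaller than $9/4$, which leaves room for the factor $(1 + \eta)$.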